arXiv:1503.04957v1 [cs.SE] 17 Mar 2015


Conformance Checking Based on Multi-Perspective 
Declarative Process Models 


A. Burattin, F. M. Maggi, A. Sperduti

University of Padua, Italy 
University of Tartu, Estonia 


Abstract 

Process mining is a family of techniques that aim at analyzing business process
execution data recorded in event logs. Conformance checking is a branch
of this discipline embracing approaches for verifying whether the behavior of a
process, as recorded in a log, is in line with some expected behavior provided
in the form of a process model. The majority of these approaches require the
input process model to be procedural (e.g., a Petri net). However, in turbulent
environments, characterized by high variability, the process behavior is less
stable and predictable. In these environments, procedural process models are less
suitable to describe a business process. Declarative specifications, working under an
open world assumption, allow the modeler to express several possible execution
paths as a compact set of constraints. Any process execution that does not
contradict these constraints is allowed. One of the open challenges in the context
of conformance checking with declarative models is the capability of supporting
multi-perspective specifications. In this paper, we close this gap by providing a
framework for conformance checking based on MP-Declare, a multi-perspective
version of the declarative process modeling language Declare. The approach has
been implemented in the process mining tool ProM and has been evaluated
in three real-life case studies.

Keywords: Process Mining, Conformance Checking, Linear Temporal Logic, 
Business Constraints, Declare 


1. Introduction 

The need to develop information systems able to fully support the business
processes of companies, and organizations in general, is becoming more and more
urgent because of the fast pace of change in markets. Such dynamic markets
impose frequent modifications and updates to business processes, leading to a
constant shortening of the life-cycle of a business
process definition. In this context, one very important functionality that any
process-aware information system should be able to support is conformance
checking, i.e., the ability to verify whether the actual flow of work is conformant
with the intended business process model. This is especially true in the case
of very complex processes, where the adoption of an imperative formalism to
represent them, such as Petri Nets [1] or BPM Notation [2], may lead to workflows
so intricate (so-called “spaghetti”-like workflows) that it becomes basically
impossible even to properly visualize the process for human inspection.

Early works in conformance checking (e.g., [?]) mainly focused on the
control-flow perspective in the context of imperative models, i.e., on the
functional dependencies among performed activities/tasks in the process, while
abstracting from time constraints, data dependencies, and resource assignments.
These works were mainly based on replaying the log on the model to compute,
according to the proposed approach, the fraction of events or traces in the log
that can be replayed by the model. An evolution of these approaches is given
by alignment-based approaches, where conformance checking is performed by
aligning the modeled behavior with the behavior observed in the log (e.g.,
[?]). Only recently, approaches able to deal with multiple perspectives have
been developed [30, ?], as well as approaches that aim at being computationally
efficient via a problem decomposition strategy [?].

When the process under consideration is complex, however, it
is preferable to use a declarative formalism, such as Declare [?], to
represent a set of constraints that must be satisfied throughout the process
execution. In this way, “spaghetti”-like workflows are avoided, and the
obtained model is flexible enough to allow all behaviors that do not violate the
defined constraints. Conformance checking approaches based on the control-flow
perspective have been defined for declarative models as well (e.g., [?]).
More recently, the additional data perspective has been considered in [?, 20],
even if in these works the data perspective is not fully integrated with the
control-flow perspective. Efficient and fully integrated multi-perspective conformance
checking proposals for declarative models, however, are still missing.

In this paper, we aim at closing this gap by proposing a multi-perspective
approach based on Declare where it is possible to define multi-perspective
constraints jointly considering data, temporal, and control-flow perspectives. To
this end, we formally define Multi-Perspective Declare (MP-Declare),
an augmented version of Declare where, thanks to the use of Metric First-Order
Linear Temporal Logic, it is possible to define activation, correlation, and time
conditions to build constraints over traces.

A nice feature of MP-Declare is that, by construction, it allows the user to 
efficiently perform conformance checking over event logs. In fact, we show that it 
is possible to define a conformance checking algorithmic framework operating on 
constraint templates, that is linear in the number of traces, constraints, and in 
the number of events of each trace. Conformance checking for a specific template 
is then obtained via definition of template-dependent procedures within the 
framework, whose time complexity depends on the actual template. Overall, 
however, the time complexity is upper bounded in the worst case by a quadratic 
function. 

We assess the validity of the proposed approach on both artificial and real
event logs. Controlled artificial data, involving logs containing up to 5 million
events, are used to assess the scalability of the proposed approach, while real
event logs generated by three real business processes are used to demonstrate
the expressivity and flexibility of constraints defined via MP-Declare.

2. Related Work 

The scientific literature reports several works in the field of conformance
checking [?]. Typically, the term conformance checking refers to the comparison
of observed behaviors, as recorded in an event log, with respect to a
process model. In the past, most conformance checking techniques were
based on procedural models. State-of-the-art examples of these approaches are
reported in [?].

In recent years, an increasing number of researchers have focused on
conformance checking with respect to declarative models. For example, in [16],
an approach for compliance checking with respect to reactive business rules is
proposed. Rules, expressed using ConDec [?], are mapped to Abductive Logic
Programming, and Prolog is used to perform the validation. The approach
has been extended in [?] by mapping constraints to LTL, and evaluating them
using automata. The entire work has been contextualized in the service
choreography scenario.

Runtime monitoring for compliance checking has also been studied based
on MFOTL, as reported in [24, 25]. In these cases, the focus is on security
policy monitoring. On the one hand, the authors try to enforce security policies;
on the other, they perform monitoring. In order to enforce security policies, it
is necessary to distinguish between controllable and observable activities and,
under specific circumstances, terminate the system in order to prevent policy
violations. Concerning the monitoring, the authors identify fragments of the
logic that describe security policies insensitive to the ordering of
actions with equal timestamps. The authors assume that monitoring is performed in
distributed systems, which have synchronized clocks with limited precision.

Another application domain that researchers have used to assess the applicability
of conformance checking techniques is the medical domain. In particular,
Grando et al. [26, 27] used Declare to model medical guidelines and to provide
semantic (i.e., ontology-based) conformance checking measures. However, in
this analysis neither the data nor the time perspective is taken into account.

In [18], the authors report an approach that can be used to evaluate the
conformance of a log with respect to a Declare model. In particular, their
algorithms compute, for each trace, whether a Declare constraint is violated
or fulfilled. Using these statistics, the approach allows the user to evaluate the
“healthiness” of the log. The approach is based on the conversion of Declare
constraints into automata and, using a so-called “activation tree”, it is able to
identify violations and fulfillments. The approach described in this work does
not take into account the data and time perspectives; only the control flow
is analyzed.

The work described in [?] consists in converting a Declare model into
an automaton and performing conformance checking of a log with respect to the
generated automaton. The conformance checking approach is based on the
concept of “alignment” and, as a result of the analysis, each trace is converted
into the most similar trace that the model accepts.

In a recent work, reported in [20], the data perspective for conformance
checking with Declare is expressed in terms of conditions on global variables
disconnected from the specific Declare constraints expressing the control flow.
This work does not take the temporal perspective into account. In contrast, we
provide a formal semantics in which the data perspective, the temporal
perspective, and the control flow are connected with each other.

3. Preliminaries 

In this section, we present the fundamental concepts required to understand 
the rest of the paper. 

3.1. Process Mining and XES 

The basic idea behind process mining is to discover, monitor and improve
processes by extracting knowledge from data that is available in today’s systems
[5]. The starting point for process mining is an event log. XES (eXtensible Event
Stream) [?] has been developed as the standard for storing, exchanging and
analyzing event logs.

Each event in a log refers to an activity (i.e., a well-defined step in some 
process) and is related to a particular case (i.e., a process instance). The events 
belonging to a case are ordered with respect to their execution times. Hence, a 
case (i.e., a trace) can be viewed as a sequence of events. Event logs may store 
additional information about events such as the resource (i.e., person or device) 
executing or initiating the activity, the timestamp of the event, or data elements 
recorded with the event. In XES, data elements can be event attributes, i.e., 
data produced by the activities of a business process and case attributes, namely 
data that are associated to a whole process instance. In this paper, we assume 
that all attributes are globally visible and can be accessed/manipulated by all 
activity instances executed inside the case. 
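
As an illustration of the log structure just described, the following minimal Python sketch models events, traces, and a log. The class and field names (`Event`, `Trace`, `attributes`, `case_attributes`) are assumptions made for this example and are not part of the XES standard or of the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    activity: str                  # name of the executed activity
    timestamp: int                 # execution time, encoded as an integer
    attributes: dict[str, Any] = field(default_factory=dict)  # event attributes


@dataclass
class Trace:
    case_id: str                   # identifier of the process instance
    events: list[Event]
    case_attributes: dict[str, Any] = field(default_factory=dict)

    def ordered(self) -> list[Event]:
        # Events of a case are ordered by their execution time.
        return sorted(self.events, key=lambda e: e.timestamp)


# A log is simply a collection of traces.
log = [
    Trace(
        "case-1",
        [
            Event("pay", 20, {"originator": "John", "amount": 100}),
            Event("register", 5, {"originator": "Mary"}),
        ],
    )
]

activities = [e.activity for e in log[0].ordered()]
print(activities)
```

Sorting by timestamp recovers the view of a case as a sequence of events, which is the representation used in the rest of the paper.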

3.2. Metric First Order Temporal Logic 

In this paper, we use Metric First Order Temporal Logic (MFOTL), first
introduced in [?]. MFOTL extends propositional metric temporal logic (MTL) [?]
to merge the expressivity of first-order logic with the MTL temporal
modalities. We deal with a fragment of MFOTL where all traces are finite.

In the following, we call “structure” a triple $D = (\Delta, \sigma, \iota)$. $\Delta$ is the domain
of the structure, i.e., an arbitrary set. $\sigma$ is the signature of the structure, i.e., a
triple $\sigma = (C, R, a)$, where $C$ is a set of constant symbols, $R$ is a set of relational
symbols, and $a$ is a function that specifies the arity of each relational symbol. $\iota$
is the interpretation function of the structure that assigns a meaning to all the
symbols in $\sigma$ over the domain $\Delta$.

Definition 1 (Timed temporal structure). A timed temporal structure over the
signature $\sigma = (C, R, a)$ is a pair $(D, \tau)$ where $D$ is a finite sequence of structures
$D = (D_1, \ldots, D_n)$ and $\tau = (\tau_1, \ldots, \tau_n)$ is a finite sequence of timestamps with
$\tau_i \in \mathbb{N}$.¹ $D$ is assumed to have constant domains, i.e., $\Delta_i = \Delta_{i+1}$ for all
$1 \le i < n$. Each constant symbol in $C$ has an interpretation that does not vary
over time. The sequence of timestamps $\tau$ is monotonically increasing, i.e.,
$\tau_i \le \tau_{i+1}$, for all $1 \le i < n$.

We indicate with $I = [a, b)$ an interval, where $a \in \mathbb{N}$ and $b \in \mathbb{N} \cup \{\infty\}$, and
with $V$ a set of variables. To express MFOTL formulas, we use the syntax:

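Definition 2 (MFOTL Syntax). Formulas of MFOTL over a signature $\sigma =
(C, R, a)$ are given by the grammar

\[
\phi ::= t_1 \approx t_2 \mid r(t_1, \ldots, t_{a(r)}) \mid \neg\phi_1 \mid \phi_1 \wedge \phi_2 \mid \exists x.\phi_1 \mid X_I \phi_1 \mid Y_I \phi_1 \mid \phi_1 S_I \phi_2 \mid \phi_1 U_I \phi_2
\]

where $\phi, \phi_1, \phi_2 \in$ MFOTL, $I = [a, b)$ is an interval, $r$ is an element of $R$, $x$
ranges over $V$, and $t_1, t_2, \ldots$ belong to $V \cup C$.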

A valuation is a mapping $v : V \rightarrow \Delta$. With abuse of notation, if $c$ is a
constant symbol in $C$, we say that $v(c) = c$. For a valuation $v$, a variable $x \in V$,
and $d \in \Delta$, $v[x/d]$ is the valuation that maps $x$ to $d$ and leaves unaltered the
valuation of the other variables.

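Definition 3 (MFOTL Semantics). Given $(D, \tau)$ a timed temporal structure
over the signature $\sigma = (C, R, a)$ with $D = (D_1, \ldots, D_n)$, $\tau = (\tau_1, \ldots, \tau_n)$, $\phi$
a formula over $\sigma$, $v$ a valuation, and $1 \le i \le n$, we define $(D, \tau, v, i) \models \phi$ as
follows:

\[
\begin{array}{lcl}
(D, \tau, v, i) \models t \approx t' & \text{iff} & v(t) = v(t') \\
(D, \tau, v, i) \models r(t_1, \ldots, t_{a(r)}) & \text{iff} & (v(t_1), \ldots, v(t_{a(r)})) \in r^{D_i} \\
(D, \tau, v, i) \models \neg\phi_1 & \text{iff} & (D, \tau, v, i) \not\models \phi_1 \\
(D, \tau, v, i) \models \phi_1 \wedge \phi_2 & \text{iff} & (D, \tau, v, i) \models \phi_1 \text{ and } (D, \tau, v, i) \models \phi_2 \\
(D, \tau, v, i) \models \exists x.\phi_1 & \text{iff} & (D, \tau, v[x/d], i) \models \phi_1, \text{ for some } d \in \Delta \\
(D, \tau, v, i) \models Y_I \phi_1 & \text{iff} & i > 1,\ \tau_i - \tau_{i-1} \in I, \text{ and } (D, \tau, v, i-1) \models \phi_1 \\
(D, \tau, v, i) \models X_I \phi_1 & \text{iff} & i < n,\ \tau_{i+1} - \tau_i \in I, \text{ and } (D, \tau, v, i+1) \models \phi_1 \\
(D, \tau, v, i) \models \phi_1 S_I \phi_2 & \text{iff} & \text{for some } j \le i,\ \tau_i - \tau_j \in I,\ (D, \tau, v, j) \models \phi_2 \\
 & & \text{and } (D, \tau, v, k) \models \phi_1 \text{ for all } k \in [j+1, i+1) \\
(D, \tau, v, i) \models \phi_1 U_I \phi_2 & \text{iff} & \text{for some } j \ge i,\ \tau_j - \tau_i \in I,\ (D, \tau, v, j) \models \phi_2 \\
 & & \text{and } (D, \tau, v, k) \models \phi_1 \text{ for all } k \in [i, j)
\end{array}
\]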


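We add syntactic sugar for the normal connectives, such as $true \equiv \exists x.\, x \approx
x$, $\phi_1 \vee \phi_2 \equiv \neg(\neg\phi_1 \wedge \neg\phi_2)$, $\forall x.\phi \equiv \neg(\exists x.\neg\phi)$, $\phi_1 \rightarrow \phi_2 \equiv (\neg\phi_1) \vee \phi_2$ and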


¹ Note that every timestamp available in an XES log can be translated into an integer.

Table 1: Semantics for some Declare templates.

Template                   LTL semantics                                        Activation
-------------------------  ---------------------------------------------------  ----------
responded existence        $G(A \rightarrow (O\,B \vee F\,B))$                  A
response                   $G(A \rightarrow F\,B)$                              A
alternate response         $G(A \rightarrow X(\neg A \; U \; B))$               A
chain response             $G(A \rightarrow X\,B)$                              A
precedence                 $G(B \rightarrow O\,A)$                              B
alternate precedence       $G(B \rightarrow Y(\neg B \; S \; A))$               B
chain precedence           $G(B \rightarrow Y\,A)$                              B
not responded existence    $G(A \rightarrow \neg(O\,B \vee F\,B))$              A
not response               $G(A \rightarrow \neg F\,B)$                         A
not precedence             $G(B \rightarrow \neg O\,A)$                         B
not chain response         $G(A \rightarrow \neg X\,B)$                         A
not chain precedence       $G(B \rightarrow \neg Y\,A)$                         B


$\phi_1 \leftrightarrow \phi_2 \equiv (\phi_1 \rightarrow \phi_2) \wedge (\phi_2 \rightarrow \phi_1)$. We also add temporal syntactic sugar,
$F_I \psi \equiv true \; U_I \; \psi$ (timed future operator), $G_I \psi \equiv \neg(F_I(\neg\psi))$ (timed globally
operator), $O_I \psi \equiv true \; S_I \; \psi$ (timed once operator) and $H_I \psi \equiv \neg(O_I(\neg\psi))$ (timed
historically operator). The non-metric variants of the temporal operators are
obtained by specifying $I = [0, \infty)$.
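
To make the finite-trace reading of the timed operators concrete, the sketch below evaluates the timed until operator and the derived future and globally operators on a propositional abstraction of a timed trace. It is only an illustration under simplifying assumptions: positions are 0-based, formulas are represented as Python predicates over positions, and the first-order part (valuations and quantifiers) is omitted. The function names are hypothetical.

```python
import math


def in_interval(delta, interval):
    # I = [a, b): closed on the left, open on the right.
    a, b = interval
    return a <= delta < b


def until(phi1, phi2, timestamps, i, interval):
    # phi1 U_I phi2 at position i: some j >= i with tau_j - tau_i in I,
    # phi2 holding at j, and phi1 holding at every k in [i, j).
    for j in range(i, len(timestamps)):
        if in_interval(timestamps[j] - timestamps[i], interval) and phi2(j):
            if all(phi1(k) for k in range(i, j)):
                return True
    return False


def eventually(phi, timestamps, i, interval=(0, math.inf)):
    # F_I phi = true U_I phi
    return until(lambda k: True, phi, timestamps, i, interval)


def globally(phi, timestamps, i, interval=(0, math.inf)):
    # G_I phi = not F_I (not phi)
    return not eventually(lambda k: not phi(k), timestamps, i, interval)


activities = ["a", "b", "a", "c"]
timestamps = [0, 3, 5, 9]


def is_b(k):
    return activities[k] == "b"


# F_[0,5) b holds at the first position, F_[4,10) b does not.
print(eventually(is_b, timestamps, 0, (0, 5)))
print(eventually(is_b, timestamps, 0, (4, 10)))

# Untimed response G(a -> F b), evaluated at the first position.
response_holds = globally(
    lambda k: activities[k] != "a" or eventually(is_b, timestamps, k),
    timestamps,
    0,
)
print(response_holds)
```

The default interval `(0, math.inf)` corresponds to the non-metric variants of the operators.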

3.3. Declare 

Declare is a declarative process modeling language originally introduced by
Pesic and van der Aalst in [?]. Instead of explicitly specifying the flow of
the interactions among process activities, Declare describes a set of constraints
that must be satisfied throughout the process execution. The possible orderings
of activities are implicitly specified by constraints and anything that does not
violate them is possible during execution. In comparison with procedural
approaches that produce “closed” models, i.e., all that is not explicitly specified
is forbidden, Declare models are “open” and tend to offer more possibilities for
the execution. In this way, Declare enjoys flexibility and is very suitable for
highly dynamic processes characterized by high complexity and variability due
to the turbulence and the changeability of their execution environments.

A Declare model consists of a set of constraints applied to activities.
Constraints, in turn, are based on templates. Templates are patterns that define
parameterized classes of properties, and constraints are their concrete
instantiations (we indicate template parameters with capital letters and concrete
activities in their instantiations with lower case letters). They have a graphical
representation understandable to the user and their semantics can be formalized
using different logics [34], the main one being LTL over finite traces, making
them verifiable and executable. Each constraint inherits the graphical
representation and semantics from its template. Table 1 summarizes some Declare
templates (the reader can refer to [13] for a full description of the language).

The responded existence template specifies that if A occurs, then B should
also occur (either before or after A). The response template specifies that when
A occurs, then B should eventually occur after A. The precedence template
indicates that B should occur only if A has occurred before. Templates
alternate response and alternate precedence strengthen the response and precedence
templates respectively by specifying that activities must alternate without
repetitions in between. Even stronger ordering relations are specified by templates
chain response and chain precedence. These templates require that the
occurrences of A and B are next to each other. Declare also includes some negative
constraints to explicitly forbid the execution of activities. The not responded
existence template indicates that if A occurs in a process instance, B cannot occur
in the same instance. According to the not response template any occurrence
of A cannot be eventually followed by B, whereas the not precedence template
requires that any occurrence of B is not preceded by A. Finally, according to
the not chain response and not chain precedence, A and B cannot occur one
immediately after the other.

The major benefit of using templates is that analysts do not have to be
aware of the underlying logic-based formalization to understand the models.
They work with the graphical representation of templates, while the
underlying formulas remain hidden. Declare is very suitable for specifying compliance
models that are used to check if the behavior of a system complies with desired
regulations. The compliance model defines the constraints related to a single
process instance, and the overall expectation is that all instances comply with
the model. Consider, for example, the response constraint $G(a \rightarrow F\,b)$. This
constraint indicates that if a occurs, b must eventually follow. Therefore, this
constraint is satisfied for traces such as $t_1 = \langle a, a, b, c \rangle$, $t_2 = \langle b, b, c, d \rangle$, and
$t_3 = \langle a, b, c, b \rangle$, but not for $t_4 = \langle a, b, a, c \rangle$ because, in this case, the second
instance of a is not followed by a b. Note that, in $t_2$, the considered response
constraint is satisfied in a trivial way because a never occurs. In this case, we
say that the constraint is vacuously satisfied [33]. In [?], the authors introduce
the notion of behavioral vacuity detection according to which a constraint is
non-vacuously satisfied in a trace when it is activated in that trace. An
activation of a constraint in a trace is an event whose occurrence imposes, because
of that constraint, some obligations on other events (targets) in the same trace.
For example, a is an activation for the response constraint $G(a \rightarrow F\,b)$ and b
is a target, because the execution of a forces b to be executed, eventually. In
Table 1, for each template the corresponding activation is specified.

An activation of a constraint can be a fulfillment or a violation for that
constraint. When a trace is perfectly compliant with respect to a constraint, every
activation of the constraint in the trace leads to a fulfillment. Consider, again,
the response constraint $G(a \rightarrow F\,b)$. In trace $t_1$, the constraint is activated and
fulfilled twice, whereas, in trace $t_3$, the same constraint is activated and fulfilled
only once. On the other hand, when a trace is not compliant with respect to a
constraint, an activation of the constraint in the trace can lead to a fulfillment
but also to a violation (at least one activation leads to a violation). In trace $t_4$,
for example, the response constraint $G(a \rightarrow F\,b)$ is activated twice, but the first
activation leads to a fulfillment (eventually b occurs) and the second activation
leads to a violation (b does not occur subsequently). An algorithm to
discriminate between fulfillments and violations for a constraint in a trace is presented
in [18]. Table 1 reports the activations for the main Declare templates.

Figure 1: Fulfillment and violation scenarios for the response constraint between
activities A and B. (a) reports a typical fulfillment scenario. In (b), the violation
is due to the violation of the correlation condition $\varphi_c$. In (c), the violation is
due to the violation of the time condition $\varphi_\tau$.

In [18], the authors define two metrics to measure the conformance of an
event log with respect to a constraint in terms of violations and fulfillments,
called violation ratio and fulfillment ratio of the constraint in the log. These
metrics are valued 0 if the log contains no activations of the considered
constraint. Otherwise, they are evaluated as the percentage of violations and
fulfillments of the constraint over the total number of activations.
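
The notions of activation, fulfillment, and violation can be illustrated on the traces $t_1, \ldots, t_4$ used above. The following is a minimal sketch for the control-flow response constraint only; it is not the algorithm of [18], and the helper names are hypothetical. The ratios are computed here by pooling all activations in the log, which is an assumption about how the metrics aggregate across traces.

```python
def check_response(trace, a, b):
    """Classify each occurrence of `a` as a fulfillment or a violation of
    the response constraint G(a -> F b). Returns lists of positions."""
    fulfillments, violations = [], []
    for i, activity in enumerate(trace):
        if activity != a:
            continue  # only occurrences of `a` activate the constraint
        if b in trace[i + 1:]:
            fulfillments.append(i)
        else:
            violations.append(i)
    return fulfillments, violations


def ratios(log, a, b):
    """Fulfillment and violation ratios over all activations in the log;
    both are 0 when the constraint is never activated."""
    n_fulfilled = n_violated = 0
    for trace in log:
        f, v = check_response(trace, a, b)
        n_fulfilled += len(f)
        n_violated += len(v)
    total = n_fulfilled + n_violated
    if total == 0:
        return 0.0, 0.0
    return n_fulfilled / total, n_violated / total


t1 = ["a", "a", "b", "c"]
t2 = ["b", "b", "c", "d"]
t3 = ["a", "b", "c", "b"]
t4 = ["a", "b", "a", "c"]

for name, t in [("t1", t1), ("t2", t2), ("t3", t3), ("t4", t4)]:
    print(name, check_response(t, "a", "b"))

print(ratios([t1, t2, t3, t4], "a", "b"))
```

Trace `t2` yields no activations at all, which corresponds to the vacuous satisfaction discussed above.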

Tools implementing process mining approaches based on Declare are
presented in [36]. The tools are implemented as plug-ins of the process mining
framework ProM.


4. MFOTL Semantics for Multi-Perspective Business Constraints 

In this section, we introduce a multi-perspective version of Declare
(MP-Declare). The version is similar to the ones in [?, 38], but we enrich it by
supporting both the time and the data perspective. To do this, we use Metric
First-Order Linear Temporal Logic (MFOTL). While many reasoning tasks are clearly
undecidable for MFOTL, this logic is appropriate to unambiguously describe the
semantics of the MP-Declare constraints we can use for conformance checking
in our proposed algorithms.

Figure 2: Fulfillment and violation scenarios for the alternate response
constraint between activities A and B. (a) reports a typical fulfillment scenario. In
(b), the violation is due to the violation of the correlation condition $\varphi_c$. In (c),
the violation is due to the violation of the time condition $\varphi_\tau$. The activation
in (d) is a fulfillment because the second occurrence of A does not satisfy the
activation condition. In contrast, (e) reports a violation since, in this case, the
second occurrence of A satisfies the activation condition.


To define the new semantics for Declare, we have to contextualize the
definitions given in Section 3.2 in XES. Consider, for example, that the execution
of an activity pay is recorded in an event log and, after the execution of pay at
timestamp $\tau_i$, the attributes originator, amount, and z have values John, 100,
and July. In this case, the valuation of (activityName, originator, amount, z)
is (pay, John, 100, July) in $\tau_i$. Considering that in XES, by definition, the
activity name is a special attribute always available, if (pay, John, 100, July) is
the valuation of (activityName, originator, amount, z), we say that, when pay
occurs, two special relations are valid: $event(pay)$ and $p_{pay}(John, 100, July)$.
In the following, we identify $event(pay)$ with the event pay itself and we call
(John, 100, July) the payload of pay.

The semantics for MP-Declare is shown in Table 2. Note that all the templates
considered here have two parameters, an activation and a target (see also
Table 1). As an example, we consider the response constraint “activity pay is
always eventually followed by activity get discount” having pay as activation
and get discount as target. The timed semantics of Declare, introduced in [?],
is extended by requiring two additional conditions on data, i.e., the activation
condition $\varphi_a$ and the correlation condition $\varphi_c$. The activation condition is a
relation (over the variables corresponding to the global attributes in the event
log) that must be valid when the activation occurs. If the activation condition
does not hold, the constraint is not activated. In the case of the response
template, the activation condition has the form $p_A(x) \wedge r_a(x)$, meaning that when A
occurs with payload $x$, the relation $r_a$ over $x$ must hold. For example, we can
say that whenever pay occurs and client type is gold then eventually get discount
must follow. In case pay occurs but client type is not gold, the constraint is not
activated. The correlation condition is a relation that must be valid when the
target occurs. It has the form $p_B(y) \wedge r_c(x, y)$, where $r_c$ is a relation involving,
again, variables corresponding to the (global) attributes in the event log but, in
this case, relating the valuation of the attributes corresponding to the payload
of A and the valuation of the attributes corresponding to the payload of B.
In our example, we can say that whenever pay occurs and client type is gold
then eventually get discount must follow and the due amount corresponding to
activity get discount must be lower than the one corresponding to activity pay.
In the following, with abuse of notation, we denote the interval characterizing
the time perspective of an MP-Declare constraint ($I = [a, b)$) with $\varphi_\tau$.

Figure 3: Fulfillment and violation scenarios for the chain response template
between activities A and B. (a) reports a typical fulfillment scenario. Note
that, in this case, the two events are contiguous. In (b), the violation is due to
the violation of the correlation condition $\varphi_c$. In (c), the violation is due to the
violation of the time condition $\varphi_\tau$.

Table 2: Semantics for MP-Declare constraints.

Template                   MFOTL semantics
-------------------------  ----------------------------------------------------------------
responded existence        $G(\forall x.((A \wedge \varphi_a(x)) \rightarrow (O_I(B \wedge \exists y.\varphi_c(x, y)) \vee F_I(B \wedge \exists y.\varphi_c(x, y)))))$
response                   $G(\forall x.((A \wedge \varphi_a(x)) \rightarrow F_I(B \wedge \exists y.\varphi_c(x, y))))$
alternate response         $G(\forall x.((A \wedge \varphi_a(x)) \rightarrow X(\neg(A \wedge \varphi_a(x)) \; U_I \; (B \wedge \exists y.\varphi_c(x, y)))))$
chain response             $G(\forall x.((A \wedge \varphi_a(x)) \rightarrow X_I(B \wedge \exists y.\varphi_c(x, y))))$

precedence                 $G(\forall x.((B \wedge \varphi_a(x)) \rightarrow O_I(A \wedge \exists y.\varphi_c(x, y))))$
alternate precedence       $G(\forall x.((B \wedge \varphi_a(x)) \rightarrow Y(\neg(B \wedge \varphi_a(x)) \; S_I \; (A \wedge \exists y.\varphi_c(x, y)))))$
chain precedence           $G(\forall x.((B \wedge \varphi_a(x)) \rightarrow Y_I(A \wedge \exists y.\varphi_c(x, y))))$

not responded existence    $G(\forall x.((A \wedge \varphi_a(x)) \rightarrow \neg(O_I(B \wedge \exists y.\varphi_c(x, y)) \vee F_I(B \wedge \exists y.\varphi_c(x, y)))))$
not response               $G(\forall x.((A \wedge \varphi_a(x)) \rightarrow \neg F_I(B \wedge \exists y.\varphi_c(x, y))))$
not precedence             $G(\forall x.((B \wedge \varphi_a(x)) \rightarrow \neg O_I(A \wedge \exists y.\varphi_c(x, y))))$
not chain response         $G(\forall x.((A \wedge \varphi_a(x)) \rightarrow \neg X_I(B \wedge \exists y.\varphi_c(x, y))))$
not chain precedence       $G(\forall x.((B \wedge \varphi_a(x)) \rightarrow \neg Y_I(A \wedge \exists y.\varphi_c(x, y))))$
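
The pay / get discount example can be sketched in code as follows. This is only an illustration of the MP-Declare response semantics, not the procedure used in the paper: events are plain dictionaries, the attribute names `client_type` and `amount` are assumptions, and the conditions $\varphi_a$ and $\varphi_c$ are passed as Python predicates.

```python
def check_mp_response(trace, activation, target, phi_a, phi_c, interval):
    """Return positions of fulfilled and violated activations for an
    MP-Declare response constraint with time interval [lo, hi)."""
    lo, hi = interval
    fulfillments, violations = [], []
    for i, e in enumerate(trace):
        if e["activity"] != activation or not phi_a(e["payload"]):
            continue  # activation condition fails: constraint not activated
        fulfilled = any(
            t["activity"] == target
            and lo <= t["timestamp"] - e["timestamp"] < hi   # time condition
            and phi_c(e["payload"], t["payload"])           # correlation condition
            for t in trace[i:]
        )
        (fulfillments if fulfilled else violations).append(i)
    return fulfillments, violations


trace = [
    {"activity": "pay", "timestamp": 0,
     "payload": {"client_type": "gold", "amount": 100}},
    {"activity": "get discount", "timestamp": 10,
     "payload": {"client_type": "gold", "amount": 80}},
    {"activity": "pay", "timestamp": 20,
     "payload": {"client_type": "silver", "amount": 50}},
    {"activity": "pay", "timestamp": 30,
     "payload": {"client_type": "gold", "amount": 200}},
    {"activity": "get discount", "timestamp": 100,
     "payload": {"client_type": "gold", "amount": 150}},
]

result = check_mp_response(
    trace,
    activation="pay",
    target="get discount",
    phi_a=lambda x: x["client_type"] == "gold",
    phi_c=lambda x, y: y["amount"] < x["amount"],
    interval=(0, 60),
)
print(result)
```

In this trace the first pay is fulfilled, the second pay does not activate the constraint (the client type is not gold), and the third pay is violated because the only later get discount falls outside the time interval.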


Graphical representations of three MP-Declare templates are reported in
Figures 1, 2, and 3. In particular, these figures report the semantics for
response, alternate response and chain response constraints. Each figure shows
possible scenarios of violations and fulfillments for the corresponding constraint.
A scenario is described reporting events as circles. Each circle is
associated with an activity (A, B, or C) and a data condition (either an activation
condition $\varphi_a$ or a correlation condition $\varphi_c$). The time condition $\varphi_\tau$ is reported
above the horizontal curly bracket. Crossed data or time conditions indicate
violated conditions. Red circles indicate events that are violations, green circles
indicate fulfillments.

The response constraint in Figure 1 indicates that, if A occurs at time $\tau_A$
with $\varphi_a$ holding true, B must occur at some point $\tau_B \in [\tau_A + a, \tau_A + b)$ with
$\varphi_c$ holding true. The alternate response constraint in Figure 2 specifies that,
if A occurs at time $\tau_A$ with $\varphi_a$ holding true, B must occur at some point
$\tau_B \in [\tau_A + a, \tau_A + b)$ with $\varphi_c$ holding true. A is not allowed in the interval
$[\tau_A, \tau_B]$ if $\varphi_a$ is true. Any event different from A is allowed and, also, A is
allowed if $\varphi_a$ is false. The chain response constraint in Figure 3 indicates that,
if A occurs at time $\tau_A$ with $\varphi_a$ holding true, B must occur next at some point
$\tau_B \in [\tau_A + a, \tau_A + b)$ with $\varphi_c$ holding true.
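
The alternate response and chain response scenarios can be sketched in the same style. The code below follows the prose description given in this section (time measured from the activation, and a second A blocking the constraint only when it satisfies $\varphi_a$ itself); it is an illustrative assumption, not the paper's procedure, and all helper names are hypothetical.

```python
def ev(activity, timestamp, **payload):
    return {"activity": activity, "timestamp": timestamp, "payload": payload}


def is_activation(e, activation, phi_a):
    return e["activity"] == activation and phi_a(e["payload"])


def is_matching_target(act, e, target, phi_c, interval):
    lo, hi = interval
    return (e["activity"] == target
            and lo <= e["timestamp"] - act["timestamp"] < hi
            and phi_c(act["payload"], e["payload"]))


def check_alternate_response(trace, activation, target, phi_a, phi_c, interval):
    fulfillments, violations = [], []
    for i, act in enumerate(trace):
        if not is_activation(act, activation, phi_a):
            continue
        fulfilled = False
        for e in trace[i + 1:]:
            if is_activation(e, activation, phi_a):
                break  # another activation occurs before a matching target
            if is_matching_target(act, e, target, phi_c, interval):
                fulfilled = True
                break
        (fulfillments if fulfilled else violations).append(i)
    return fulfillments, violations


def check_chain_response(trace, activation, target, phi_a, phi_c, interval):
    fulfillments, violations = [], []
    for i, act in enumerate(trace):
        if not is_activation(act, activation, phi_a):
            continue
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        ok = nxt is not None and is_matching_target(act, nxt, target, phi_c, interval)
        (fulfillments if ok else violations).append(i)
    return fulfillments, violations


def phi_a(x):
    return x.get("flag", False)


def phi_c(x, y):
    return True


# Scenarios analogous to Figure 2 (d) and (e).
scenario_d = [ev("A", 0, flag=True), ev("A", 1, flag=False), ev("B", 2)]
scenario_e = [ev("A", 0, flag=True), ev("A", 1, flag=True), ev("B", 2)]

alt_d = check_alternate_response(scenario_d, "A", "B", phi_a, phi_c, (0, 10))
alt_e = check_alternate_response(scenario_e, "A", "B", phi_a, phi_c, (0, 10))
chain_d = check_chain_response(scenario_d, "A", "B", phi_a, phi_c, (0, 10))
print(alt_d, alt_e, chain_d)
```

In scenario (d) the intermediate A does not satisfy the activation condition, so the first activation is fulfilled; in scenario (e) it does, so the first activation is violated while the second one is fulfilled. Under chain response, scenario (d) is a violation because the event immediately following the activation is not B.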

5. Conformance Checking Algorithms 

As stated in the previous section, with MP-Declare, it is possible to
express Declare constraints taking into account also the temporal and the data
perspectives. As an example, it is possible to express constraints like:

• activity A must occur between 10 and 11 hours before activity B;

• if activity A writes a variable x with value < 1000, then B must occur after
two days.

Therefore, using this language, it is possible to define multi-perspective
compliance models that can be used for several purposes, for example, for
representing Service Level Agreements (SLAs). In this context, it would be useful to
provide the user with techniques to detect whether cases are actually fulfilling
the required set of constraints or not. In this section, we present algorithms to
check the conformance of an event log with respect to an MP-Declare model.

The proposed approach for the conformance checking of MP-Declare
constraints is based on several procedures. The main component is described in
the CheckLogConformance procedure, reported in Algorithm 1. This algorithm
requires as input a log and an MP-Declare model (i.e., a set of MP-Declare
constraints). Then, it iterates through all traces and, for each constraint, it
computes the violations and the fulfillments by calling the CheckTraceConformance
procedure. CheckTraceConformance, described in Algorithm 2, takes as input a
trace and a constraint and generates the set of violating and fulfilling events for
that specific constraint in that specific trace. The basic idea of this procedure is
to iterate through all the events of the trace and, for each of them, call specific
template-dependent operations (lines 5-11).


Algorithm 1: CheckLogConformance
Input: Log: an event log
       Model: a model
Output: A set of violating and fulfilling traces/constraints

1  Let fulfill and viol be maps that, given a trace and a constraint, return
   the set of fulfilling and violating events
2  foreach trace ∈ Log do
3      foreach constr ∈ Model do
4          v, f ← CheckTraceConformance(trace, constr)    // Algorithm 2
5          viol[trace][constr] ← v
6          fulfill[trace][constr] ← f
7      end
8  end
9  return viol, fulfill


Algorithm 2: CheckTraceConformance
Input:  trace: a trace
        c = ⟨templ, A, T, φ_a, φ_c, φ_τ⟩: a constraint
Output: Set of violating and fulfilling events

 1  pending ← ∅
 2  fulfillments ← ∅
 3  violations ← ∅
    /* All the following calls are allowed to make side effects
       on the provided parameters */
 4  templ.opening()                                     /* Opening template operations */
 5  foreach e ∈ trace do
 6      templ.fulfillment(e, trace, pending, fulfillments, T, φ_a, φ_c, φ_τ)
 7      templ.violation(e, trace, pending, violations, T, φ_c, φ_τ)
 8      templ.activation(e, A, pending, φ_a)
 9  end
10  templ.closing(pending, fulfillments, violations)    /* Closing template operations */
11  return violations, fulfillments


The described algorithms can be seen as a general “framework” that can
be used for conformance checking with respect to different templates. Each
template that needs to be verified must define the following required
operations:

• opening: this procedure is called once per trace, before starting the
analysis of the first event of the trace;

• fulfillment: this procedure is called for each event of the trace and is
supposed to return the set of fulfillments that have been observed so far;
modifications to the set of activations are allowed as well;

• violation: this procedure is called for each event of the trace and is
supposed to return the set of violations that have been observed so far;
modifications to the set of activations are allowed as well;

• activation: this procedure is called for each event of the trace and is
supposed to update the set of activations that have been observed so far
(i.e., whether the current event is a new activation or not);

• closing: this procedure is called once per trace, after all the events have
been analyzed.
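The call order imposed by the two algorithms can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ProM implementation: events are identified by their position in the trace, the constraint parameters are assumed to be carried by the template object itself, and `RecordingTemplate` is a hypothetical template that does nothing except record which operation is called and when.

```python
from typing import Dict, List, Set, Tuple


class RecordingTemplate:
    """Hypothetical template: records the order of the calls, nothing else."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def opening(self) -> None:
        self.calls.append("opening")

    def fulfillment(self, i, trace, pending, fulfillments) -> None:
        self.calls.append(f"fulfillment({i})")

    def violation(self, i, trace, pending, violations) -> None:
        self.calls.append(f"violation({i})")

    def activation(self, i, trace, pending) -> None:
        self.calls.append(f"activation({i})")

    def closing(self, pending, fulfillments, violations) -> None:
        self.calls.append("closing")


def check_trace_conformance(trace, templ) -> Tuple[Set[int], Set[int]]:
    """Sketch of Algorithm 2; the template may mutate the three sets."""
    pending: Set[int] = set()
    fulfillments: Set[int] = set()
    violations: Set[int] = set()
    templ.opening()
    for i in range(len(trace)):
        templ.fulfillment(i, trace, pending, fulfillments)
        templ.violation(i, trace, pending, violations)
        templ.activation(i, trace, pending)
    templ.closing(pending, fulfillments, violations)
    return violations, fulfillments


def check_log_conformance(log, model):
    """Sketch of Algorithm 1; results are keyed by (trace index, constraint index)."""
    viol: Dict[Tuple[int, int], Set[int]] = {}
    fulfill: Dict[Tuple[int, int], Set[int]] = {}
    for t, trace in enumerate(log):
        for c, templ in enumerate(model):
            viol[(t, c)], fulfill[(t, c)] = check_trace_conformance(trace, templ)
    return viol, fulfill
```

Keeping the five operations behind one interface means that supporting a new template only requires a new class; the two driver functions stay unchanged.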

In this paper, we illustrate the procedures for three templates, i.e., response,
alternate response, and chain response. We consider these three specifications
sufficiently representative to provide a clear idea of the capabilities of our
framework (all the procedures for conformance checking based on MP-Declare have
been implemented and are publicly available; see Section 6). In each procedure,
given the set of all possible activities $\mathcal{A}$, we define a constraint
as a tuple $c = \langle template, A, T, \varphi_a, \varphi_c, \varphi_\tau \rangle$,
where $template$ indicates which template the constraint is referring to,
$template \in \{existence, absence, choice, responded\ existence, \ldots\}$;
$A \subseteq \mathcal{A}$ is the nonempty set of activations;
$T \subseteq \mathcal{A}$ is the nonempty set of targets; $\varphi_a$ and
$\varphi_c$ indicate, respectively, the activation and the correlation
condition; and $\varphi_\tau$ represents the time condition. We also use the
functions $verify(\varphi_a, A)$, $verify(\varphi_c, A, B)$, and
$verify(\varphi_\tau, A, B)$. The first function evaluates $\varphi_a$ with
respect to the attributes reported in $A$. The second function evaluates
$\varphi_c$ with respect to the attributes defined in $A$ and $B$. The third
function compares the timestamps attached to $A$ and $B$ in order to see whether
$\varphi_\tau$ is satisfied or not. As already mentioned, each event recorded in
an event log carries a payload of attributes. In the description of the
algorithms, we use the $\pi_a(e)$ operator to get the value of an attribute $a$
of an event $e$. For example, we use $\pi_{activity}(e)$ to select the activity
name associated with $e$.
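A minimal Python sketch of the attribute operator and of the three verify functions is given below. The representation is an assumption made for illustration only: an event is a dictionary of attributes, conditions are plain callables, and a time condition is a pair of `timedelta` bounds. The names `pi`, `verify_activation`, `verify_correlation`, and `verify_time` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Tuple

Event = Dict[str, Any]  # assumed payload, e.g. {"activity": ..., "time": ...}


def pi(attribute: str, e: Event) -> Any:
    """pi_a(e): value of attribute a in the payload of event e."""
    return e[attribute]


def verify_activation(phi_a: Callable[[Event], bool], a: Event) -> bool:
    """Evaluate the activation condition on the attributes of a."""
    return phi_a(a)


def verify_correlation(phi_c: Callable[[Event, Event], bool], a: Event, b: Event) -> bool:
    """Evaluate the correlation condition on the attributes of a and b."""
    return phi_c(a, b)


def verify_time(phi_tau: Tuple[timedelta, timedelta], a: Event, b: Event) -> bool:
    """True if the distance between the two timestamps lies in [lo, hi]."""
    lo, hi = phi_tau
    distance = abs(pi("time", b) - pi("time", a))
    return lo <= distance <= hi
```

The absolute distance is used so that the same check applies whether the target follows or precedes the activation.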

The first template we consider is response and the corresponding procedures
are reported in Table 3. The opening procedure does nothing. The fulfillment
procedure checks whether the input event refers to a target. If this is the
case, then all pending activations that can be correlated to this target (i.e.,
those for which the time and the correlation conditions are satisfied) become
fulfillments. The activation procedure checks whether the input event refers to
an activation of the constraint and the activation condition $\varphi_a$ is
satisfied (in this case the event is added to the set of pending activations).
Violations are identified in the closing procedure (the violation procedure is
not used in this case). Here, all pending activations that do not have a
corresponding target when the entire trace has been processed become violations.


Response

template.opening()
1  do nothing

template.fulfillment(e, trace, pending, fulfillments, T, φ_a, φ_c, φ_τ)
1  if π_activity(e) ∈ T then
2      foreach act ∈ pending do
3          if verify(φ_c, act, e) and verify(φ_τ, act, e) then
4              pending ← pending \ {act}
5              fulfillments ← fulfillments ∪ {act}
6          end
7      end
8  end

template.violation(e, trace, pending, violations, T, φ_c, φ_τ)
1  do nothing              /* Actual violations are not identified here */

template.activation(e, A, pending, φ_a)
1  if π_activity(e) ∈ A and verify(φ_a, e) then
2      pending ← pending ∪ {e}
3  end

template.closing(pending, fulfillments, violations)
1  foreach act ∈ pending do
2      pending ← pending \ {act}
3      violations ← violations ∪ {act}
4  end

Table 3: Procedure specifications for the response constraint.
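The response procedures can be condensed into a single Python function, shown here as a sketch under stated assumptions: events are dictionaries with an "activity" key, events are identified by their position in the trace, and the three conditions are callables that default to "always true". The helper `always` and the function name `check_response` are hypothetical.

```python
def always(*_events):
    """Default condition: no restriction."""
    return True


def check_response(trace, A, T, phi_a=always, phi_c=always, phi_tau=always):
    """Return (violations, fulfillments) as sets of event positions."""
    pending, fulfillments, violations = set(), set(), set()
    for i, e in enumerate(trace):
        # fulfillment: a target resolves every pending activation it correlates with
        if e["activity"] in T:
            for act in sorted(pending):  # sorted() copies, so mutation is safe
                if phi_c(trace[act], e) and phi_tau(trace[act], e):
                    pending.discard(act)
                    fulfillments.add(act)
        # violation: nothing to do for response
        # activation: remember the event until a correlated target shows up
        if e["activity"] in A and phi_a(e):
            pending.add(i)
    # closing: whatever is still pending never found a target
    violations |= pending
    return violations, fulfillments
```

With a correlation condition, a target that does not correlate leaves the activation pending, so the same activation can still be fulfilled by a later target.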













The procedures for the alternate response template are reported in Table 4.
In particular, opening defines a new data structure (possibleTargets) that will
be used by the other procedures. The fulfillment procedure starts by checking
whether the input event refers to an activation and the activation condition is
satisfied. If this is the case, the procedure checks whether there is exactly
one pending activation and at least one possible target. If so, and if for at
least one possible target the time and the correlation conditions are satisfied,
the pending activation becomes a fulfillment (fulfillment, lines 4-8). If the
activity referring to the input event is a target, the event is added to the set
of possible targets (fulfillment, line 14). The violation procedure also starts
by checking whether the input event refers to an activation and the activation
condition is satisfied. If this is the case, the procedure checks whether there
is exactly one pending activation. If so, the pending activation becomes a
violation (the pending activation cannot be a fulfillment because, in that case,
the invocation of the fulfillment procedure would already have moved it from the
pending set to the fulfillment set). The activation procedure checks whether the
input event refers to an activation and the activation condition is satisfied.
In this case, the set of possible targets is reset to the empty set and the
event is added to the set of pending activations. The closing procedure verifies
whether, if there is a pending activation, this activation can be correlated to
at least one possible target. If this is the case (i.e., the time and the
correlation conditions are satisfied), the activation becomes a fulfillment
(closing, line 7); otherwise, it is marked as a violation (closing, line 11).


Alternate Response

template.opening()
 1  define possibleTargets ← ∅ as a data structure available throughout the
    entire CheckTraceConformance algorithm

template.fulfillment(e, trace, pending, fulfillments, T, φ_a, φ_c, φ_τ)
 1  if π_activity(e) ∈ A and verify(φ_a, e) then
 2      if |possibleTargets| ≥ 1 and |pending| = 1 then
 3          act ← element ∈ pending            // There is only one element
 4          foreach p ∈ possibleTargets do
 5              if verify(φ_c, act, p) and verify(φ_τ, act, p) then
 6                  fulfillments ← fulfillments ∪ {act}
 7                  pending ← pending \ {act}
 8                  break                      // It is possible to exit the loop
 9              end
10          end
11      end
12  end
13  if π_activity(e) ∈ T then
14      possibleTargets ← possibleTargets ∪ {e}
15  end

template.violation(e, trace, pending, violations, T, φ_c, φ_τ)
 1  if π_activity(e) ∈ A and verify(φ_a, e) then
 2      if |pending| = 1 then
 3          act ← element ∈ pending            // There is only one element
 4          pending ← pending \ {act}
 5          violations ← violations ∪ {act}
 6      end
 7  end

template.activation(e, A, pending, φ_a)
 1  if π_activity(e) ∈ A and verify(φ_a, e) then
 2      possibleTargets ← ∅
 3      pending ← pending ∪ {e}
 4  end

template.closing(pending, fulfillments, violations)
 1  if |pending| = 1 then
 2      targetFound ← false
 3      act ← element ∈ pending                // There is only one element
 4      foreach p ∈ possibleTargets do
 5          if verify(φ_c, act, p) and verify(φ_τ, act, p) then
 6              targetFound ← true
 7              fulfillments ← fulfillments ∪ {act}
 8          end
 9      end
10      if not targetFound then
11          violations ← violations ∪ {act}
12      end
13  end

Table 4: Procedure specifications for the alternate response constraint.
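The alternate response procedures can be sketched in Python as one function. As before, this is an illustration under assumptions rather than the authors' code: events are dictionaries with an "activity" key, positions in the trace identify events, and conditions are callables defaulting to "always true". The names `always`, `check_alternate_response`, and `has_correlated_target` are hypothetical.

```python
def always(*_events):
    """Default condition: no restriction."""
    return True


def check_alternate_response(trace, A, T, phi_a=always, phi_c=always, phi_tau=always):
    """Return (violations, fulfillments) as sets of event positions."""
    pending, fulfillments, violations = set(), set(), set()
    possible_targets = []  # opening: targets seen since the last activation

    def has_correlated_target(act):
        return any(
            phi_c(trace[act], trace[p]) and phi_tau(trace[act], trace[p])
            for p in possible_targets
        )

    for i, e in enumerate(trace):
        is_activation = e["activity"] in A and phi_a(e)
        # fulfillment: a new activation settles the previous one if a target matched
        if is_activation and possible_targets and len(pending) == 1:
            act = next(iter(pending))
            if has_correlated_target(act):
                fulfillments.add(act)
                pending.discard(act)
        if e["activity"] in T:
            possible_targets.append(i)
        # violation: the previous activation is still pending, so it was not fulfilled
        if is_activation and len(pending) == 1:
            act = next(iter(pending))
            pending.discard(act)
            violations.add(act)
        # activation: start collecting targets for the new activation
        if is_activation:
            possible_targets = []
            pending.add(i)
    # closing: decide the fate of the last pending activation
    if len(pending) == 1:
        act = next(iter(pending))
        (fulfillments if has_correlated_target(act) else violations).add(act)
    return violations, fulfillments
```

In the trace a, b, a, a, b the second activation is a violation because the third activation occurs before any target, while the first and third activations are fulfilled.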

The procedures for the chain response template are reported in Table 5.
As for the response template, opening does nothing. The fulfillment and the
violation procedures verify whether there is exactly one element in the set of
pending activations. In this case, they check whether the input event refers to
a target and the time and correlation conditions are satisfied. If this is the
case, the pending activation becomes a fulfillment; otherwise, it is marked as a
violation. The activation procedure checks whether the input event refers to an
activation and the activation condition is satisfied (in this case the event is
added to the set of pending activations). The closing procedure checks whether
there is still a pending activation when the entire trace has been processed. In
this case, the pending activation becomes a violation.

Chain Response

template.opening()
1  do nothing

template.fulfillment(e, trace, pending, fulfillments, T, φ_a, φ_c, φ_τ)
1  if |pending| = 1 then
2      act ← element ∈ pending                // There is only one element
3      if π_activity(e) ∈ T and verify(φ_c, act, e) and verify(φ_τ, act, e) then
4          pending ← pending \ {act}
5          fulfillments ← fulfillments ∪ {act}
6      end
7  end

template.violation(e, trace, pending, violations, T, φ_c, φ_τ)
1  if |pending| = 1 then
2      act ← element ∈ pending                // There is only one element
3      if π_activity(e) ∉ T or not verify(φ_c, act, e) or not verify(φ_τ, act, e) then
4          pending ← pending \ {act}
5          violations ← violations ∪ {act}
6      end
7  end

template.activation(e, A, pending, φ_a)
1  if π_activity(e) ∈ A and verify(φ_a, e) then
2      pending ← pending ∪ {e}
3  end

template.closing(pending, fulfillments, violations)
1  foreach act ∈ pending do
2      pending ← pending \ {act}
3      violations ← violations ∪ {act}
4  end

Table 5: Procedure specifications for the chain response constraint.
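For chain response, the fulfillment and violation procedures test complementary conditions on the event that immediately follows an activation, so a Python sketch can decide both in one step. The event representation (dictionaries with an "activity" key, identified by position) and the names `always` and `check_chain_response` are assumptions made for this illustration.

```python
def always(*_events):
    """Default condition: no restriction."""
    return True


def check_chain_response(trace, A, T, phi_a=always, phi_c=always, phi_tau=always):
    """Return (violations, fulfillments) as sets of event positions."""
    pending, fulfillments, violations = set(), set(), set()
    for i, e in enumerate(trace):
        if len(pending) == 1:
            act = next(iter(pending))
            pending.discard(act)
            is_target = (
                e["activity"] in T
                and phi_c(trace[act], e)
                and phi_tau(trace[act], e)
            )
            # the next event either fulfills or violates the pending activation
            (fulfillments if is_target else violations).add(act)
        # activation: the event waits for the very next event of the trace
        if e["activity"] in A and phi_a(e):
            pending.add(i)
    # closing: an activation at the very end of the trace has no next event
    violations |= pending
    return violations, fulfillments
```

Because the pending set is emptied at every step, it never holds more than one activation, which is why these procedures run in constant time per event.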

The algorithms for the other MP-Declare templates introduced earlier can be
easily derived from the ones described in this section. In particular, the
algorithms for precedence, alternate precedence, and chain precedence are the
same as the ones described for response, alternate response, and chain response,
respectively. The only difference is that, for the precedence templates, the
traces in the input log have to be parsed from the end to the beginning.
Similarly, the algorithms for checking the negative templates are the same as
the ones described for the corresponding positive templates. In this case, every
fulfillment for a positive template becomes a violation for the corresponding
negative template and vice versa.

From the computational complexity point of view, it is worth noting that the
complexity of Algorithm 1 and Algorithm 2 is linear in the number of traces, in
the number of constraints, and in the number of events of each trace. The
complexity of the template-dependent procedures, instead, depends on the actual
template. Specifically, with respect to the procedures of each constraint
reported in this paper, we have the following complexities:

• Response: opening, violation, and activation are constant; fulfillment and
closing have linear complexity in the number of pending activations for
the current trace (which is at most the number of events in the trace);

• Alternate Response: opening, violation, and activation are constant;
fulfillment and closing are linear in the number of possible targets (which is
at most the number of events in the trace);

• Chain Response: opening, fulfillment, violation, and activation are constant;
closing is linear in the number of pending activations for the current trace
(which is at most the number of events in the trace).
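Returning to the derivation of the precedence and negative templates described above, the following Python sketch shows both tricks on a control-flow-only response checker. It is a simplified illustration: traces are lists of activity names, data and time conditions are left out, and the function names are hypothetical. Positions returned by `precedence` are mapped back to the original, non-reversed trace.

```python
def response(trace, A, T):
    """Control-flow-only response: returns (violations, fulfillments)."""
    pending, fulfilled = [], []
    for i, activity in enumerate(trace):
        if activity in T:  # a target fulfills every pending activation
            fulfilled.extend(pending)
            pending = []
        if activity in A:
            pending.append(i)
    return set(pending), set(fulfilled)


def precedence(trace, A, T):
    """Precedence as response on the trace parsed from the end to the beginning."""
    last = len(trace) - 1
    violations, fulfillments = response(trace[::-1], A, T)
    return ({last - i for i in violations}, {last - i for i in fulfillments})


def not_response(trace, A, T):
    """Negative template: fulfillments and violations of response are swapped."""
    violations, fulfillments = response(trace, A, T)
    return fulfillments, violations
```

Here `A` is always the set of activations, so for precedence it contains the activity that must be preceded by a target.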


6. Implementation and Benchmarks 

This section provides some details on the implementation of the approach 
and a benchmark analysis on different scenarios. 


6.1. Implementation Details 

The entire approach has been implemented as a plug-in of the process mining
toolkit ProM.* In particular, the plug-in receives as input an event log and a
model and evaluates the conformance of the log with respect to the model. It is
worth noting that, in the current implementation, the processing of each
trace is independent of all the others. Also, the analysis of a constraint in
the reference model is independent of all the others. For this reason, it is
possible to parallelize and distribute the analysis over different computational
nodes and substantially improve performance. The results of the tests reported
in this paper, however, do not benefit from such a possibility: our tests
sequentially evaluate each constraint on each trace.
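Since every (trace, constraint) pair can be checked independently, the analysis could be distributed along the lines of the following Python sketch. This is not how the ProM plug-in is written; it only illustrates the idea. Each constraint is assumed to be a callable that takes a trace and returns its result, and `check_log_parallel` is a hypothetical name. A thread pool is used to keep the example self-contained; for CPU-bound checks in Python, a process pool would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def check_log_parallel(log, model, workers=4):
    """Check every (trace, constraint) pair independently of all the others."""
    pairs = list(product(range(len(log)), range(len(model))))

    def check(pair):
        t, c = pair
        return model[c](log[t])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `pairs`, so zip() lines results up
        return dict(zip(pairs, pool.map(check, pairs)))
```

The result is the same map that a sequential double loop over traces and constraints would produce, only computed concurrently.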

The conformance checking results are presented using a ProM plug-in called
“Analysis Result Visualizer”. This visualizer is composed of three main
windows. The first window consists of a summary of the statistics computed for
each constraint (e.g., number of activations, number of violations, and number
of fulfillments) on the entire log. This window is shown in Figure 4.

The second window (shown in Figure 5a) provides a more detailed view.
This window is divided into three columns. The leftmost column contains a list
of all the cases with information on case id, number of activations in the case,
and number of fulfillments and violations. The central column contains the list
of constraints in the reference model. When a case and a constraint are
selected, a representation of the case appears in the rightmost column of the
window. In this representation, each event is drawn as a rectangle.
Green-painted rectangles represent fulfillments, red-painted rectangles
represent violations. It is possible to move the mouse cursor over each
rectangle to see the complete set of attributes belonging to the event.

The third window (shown in Figure 5b) also lists all cases. Here, each event
of a case is represented as a small box that can be gray, green (in case the
event is a fulfillment), or red (in case the event is a violation). This
visualization is also called “birdview” since it provides a high-level overview
of the constraints and allows the user to quickly identify possible issues. When
the mouse is moved over an event, a pop-up showing the corresponding activity
name appears. In both the second and the third window, it is possible to sort
cases based on different parameters (name of the case, number of activations,
number of violations, and number of fulfillments), or to interactively search
for cases with a specific case id.

Figure 4: Overall details window with the result summary.

* The software can be downloaded from http://www.promtools.org/prom6

6.2. Benchmarks 

In order to gain some insights on the computational feasibility of our
implementation, we ran several tests in different possible scenarios. In
particular, we tested our implementation against logs with different sizes and
different trace lengths. We generated traces with 10, 20, 30, 40, and 50 events
and, for each of these lengths, we generated logs with 25 000, 50 000, 75 000,
and 100 000 traces. Therefore, in total, we used 20 logs. The number of events
contained in each log is reported in Table 6. In addition, we designed 10
Declare models. In particular, we prepared two models with 10 constraints, one
only containing constraints on the control-flow (without conditions on data and
time), and another one including real multi-perspective constraints (with
conditions on time and data). We followed the same procedure to create models
with 20, 30, 40, and 50 constraints.

Figure 5: Windows used to inspect the conformance checking results by focusing
on single cases. (a) Window with the conformance checking details for a single
case and constraint. (b) Birdview-like window showing an overview of
fulfillments and violations for some cases.

                          Number of log traces
Trace length (events)     25 000       50 000       75 000      100 000
10                       250 000      500 000      750 000    1 000 000
20                       500 000    1 000 000    1 500 000    2 000 000
30                       750 000    1 500 000    2 250 000    3 000 000
40                     1 000 000    2 000 000    3 000 000    4 000 000
50                     1 250 000    2 500 000    3 750 000    5 000 000

Table 6: Number of events for each log.

We checked each log against each model, and we repeated the procedure five
times, in order to get the average execution times for each configuration. To
provide more accurate results, the times reported here are measured without
considering the time needed to generate the graphical visualization (we
performed the tests on a custom command-line version of ProM). Each test was run
on one of two machines (part of a cluster), chosen at random, with the following
hardware configurations: (i) 4 x Eight-Core Intel(R) Xeon(R) CPU E5-4640 0
@ 2.40GHz; (ii) 2 x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz.

Figure 6 provides a graphical representation of the average execution times
for the analysis of all models and logs. In particular, the graph on top reports
the execution times using models with control-flow based constraints. The graph
at the bottom reports the execution times using real multi-perspective models
(with conditions on time and data). In Figure 7 and in Figure 8, we also report
the average execution times (and standard deviations) required to analyze all
models and logs, but we provide different views on the data. In particular, in
Figure 7, the execution times are grouped based on the number of traces in the
logs. The graph on the left-hand side reports the execution times using models
with control-flow based constraints, the one on the right-hand side reports the
execution times using multi-perspective constraints. In Figure 8, the execution
times are grouped based on the number of events in each trace.

As the statistics show, the time required to perform the analysis depends both
on the number of events in each trace and on the size of the log. However, the
execution times measured using models with control-flow based constraints seem
to be more influenced by the number of events in each trace. We believe that
this is due to the additional cost of starting up the data validation engine in
the case of multi-perspective models. In particular, it is necessary to restart
this engine for each trace, and the additional time required is so high that it
masks the differences in performance between traces of different lengths. In
general, it is worth noting that the most expensive configuration (a model with
50 multi-perspective constraints, and a log with 100 000 traces and 5 000 000
events) requires, on average, 255 369 milliseconds, i.e., about 4.3 minutes.
This supports the scalability of our approach.
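The figures quoted for the most expensive configuration can be turned into a rough throughput estimate. The short Python sketch below only performs arithmetic on the numbers reported above; the notion of a "check" (one iteration of the event loop for one constraint) and the function name `benchmark_summary` are introduced here for illustration.

```python
def benchmark_summary(traces, events_per_trace, constraints, elapsed_ms):
    """Derive log size, elapsed minutes, and checks per second for one run."""
    events = traces * events_per_trace
    checks = events * constraints  # one loop iteration per (event, constraint)
    return {
        "events": events,
        "minutes": elapsed_ms / 60_000,
        "checks_per_second": checks / (elapsed_ms / 1000),
    }


summary = benchmark_summary(
    traces=100_000, events_per_trace=50, constraints=50, elapsed_ms=255_369
)
print(summary)
```

Under this reading, the largest run processes slightly under one million (event, constraint) checks per second.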

7. Case Studies 

This section provides three case studies on real datasets. The first one is 
based on an event log provided by an academic hospital, the second one is a 
case study provided by a financial institution and the third one is based on a 
dataset provided by a bank. 

7.1. A Large Academic Hospital 

We have conducted a case study by using the BPI challenge 2011 event log
[55]. This log pertains to a healthcare process and, in particular, contains the
executions of a process related to the treatment of patients diagnosed with
cancer in a large Dutch academic hospital. The whole event log contains 1 143
cases and 150 291 events distributed across 623 event classes (i.e., each event
refers to one of 623 different possible activities). Each case describes the
treatment of a different patient. The event log contains domain-specific
attributes that are both case attributes and event attributes, in addition to
the standard XES attributes. For example, Age, Diagnosis, and Treatment code are
case attributes, and Activity code, Number of executions, Specialism code, and
Group are event attributes. As mentioned in Section 3.1, in our analysis all the
attributes are considered visible for all the activities and we suppose that an
activity overwrites the old values of all the event attributes attached to it.

Figure 6: Execution times in milliseconds required to process logs with
different numbers of traces of different lengths. The plot on top refers to
models with control-flow constraints. The plot at the bottom refers to models
with control-flow, data and time constraints.

Table 7: Reference constraints used to analyze the log from the BPI challenge
2011.

Id  Constraint  1st param.                     2nd param.                         Activation condition                           Correlation condition       Time condition
1   Precedence  ca-125 using meia              outpatient follow-up consultation  A.Diagnosis == 'maligniteit ovarium or tuba'   -                           0,15,d
2   Precedence  First outpatient consultation  telephone consultation             -                                              A.org:group == T.org:group  -

Figure 7: Execution times in milliseconds grouped based on the number of traces
in the logs. The plot on the left-hand side refers to models with control-flow
constraints. The plot on the right-hand side refers to models with control-flow,
data and time constraints.

Figure 8: Execution times in milliseconds grouped based on the number of events
in each trace. The plot on the left-hand side refers to models with control-flow
constraints. The plot on the right-hand side refers to models with control-flow,
data and time constraints.

Table 8: Conformance checking results using the log from the BPI challenge
2011.

Id  Act.no.  Viol.no.  Fulfill.no.  Avg.act.sparsity  Avg.viol.ratio  Avg.fulfill.ratio
1   343      242       101          0.9844            0.7055          0.2945
2   1 286    546       740          0.9677            0.4246          0.5754

Table 9: Execution times using the log from the BPI challenge 2011.

Id  Avg.execution time (milliseconds)
1   1 759
2   1 828

Figure 9: Example of violations for constraint 1. The screenshot header reports
Activations: 11, Fulfilments: 0, Violations: 11.

To investigate the behavior of the process as recorded in the log, we have
used the constraints shown in Table 7. The idea behind constraint 1 is that
the tumor marker “ca-125” is used in the follow-up of patients diagnosed with
ovarian cancer as an indicator of the evolution of the tumor. For this reason,
we would expect that, if the diagnosis for a patient is “maligniteit ovarium”,
the follow-up consultation is preceded by the analysis of this tumor marker. In
addition, we require a time condition indicating that this analysis should not
come too early with respect to the follow-up. As shown in Table 8, constraint 1
has 343 activations. This means that there are 343 occurrences of outpatient
follow-up consultation associated with a Diagnosis equal to maligniteit
ovarium or tuba. As also shown in Table 8, around 70% of these activations are
violations. One of the reasons why there are so many violations in the log
for this constraint is that there can be several follow-ups in a case and some
of them are not correlated with the “ca-125” test but with other tests. In
Figure 9, it is possible to see some violations for constraint 1. For example,
the selected event outpatient follow-up consultation is an activation for
the constraint since, in its payload, the value for Diagnosis is maligniteit
ovarium or tuba. However, this activation is probably connected with the
computed tomography abdomen and/or the ultrasound test done immediately
before.

The idea behind constraint 2 is that the first consultation for a patient in the
hospital cannot be a telephone consultation. We also add a correlation condition
to understand whether every telephone consultation is preceded by a first
consultation in the same department. There is no activation condition for this
constraint. This means that every time telephone consultation occurs, the
constraint is activated. The constraint has 1 286 activations. Around 42% of
these activations are violations. Some of these violations are due to the
occurrence of telephone consultations preceded by a first consultation in a
different department. In addition, it is also worth highlighting that the log we
are using for this case study is an excerpt derived from a larger log and it
contains several cases that are truncated both at the beginning and at the end.
This can also be a reason for violations of this constraint.
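The percentages quoted for constraints 1 and 2 can be cross-checked against the counts in Table 8. The Python sketch below assumes that the violation and fulfillment ratios are simply violations and fulfillments divided by activations; this definition is an assumption made here, although it reproduces the reported values to four decimals. The activation sparsity column is not recomputed because its definition is not part of this section.

```python
def ratios(activations, violations, fulfillments):
    """Return (violation ratio, fulfillment ratio), rounded to 4 decimals."""
    if activations != violations + fulfillments:
        raise ValueError("every activation must be a violation or a fulfillment")
    return (
        round(violations / activations, 4),
        round(fulfillments / activations, 4),
    )


# Counts taken from Table 8 (BPI challenge 2011 log).
print(ratios(343, 242, 101))
print(ratios(1286, 546, 740))
```

The consistency check inside the function reflects the fact that, once a trace has been fully processed, no activation is left pending.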

In Table 9, we show the execution times needed for checking the constraints
in this case study.*† For each of them, the execution time is lower than 2
seconds. This confirms the scalability of our tool.

7.2. A Dutch Financial Institution

The second case study we discuss is based on the application of the proposed 
approach to the event log provided for the BPI challenge 2012 and taken from a 
Dutch financial institute [ID]. The event log pertains to an application process 
for personal loans or overdrafts. It contains 262 200 events distributed across 36 
event classes and includes 13 087 cases. The amount requested by the customer 
is indicated in the case attribute AMOUNT_REQ. In addition, the log contains the 
standard XES attributes for events. 

For this case study, we have used the constraints shown in Table 10. Some of
these constraints involve specific transactional states (a.k.a. event types)
of an activity. For example, the parameters specified for constraints 7-10 are
W_Valideren aanvraag-SCHEDULE and W_Valideren aanvraag-START. When
an event type is not specified, as in the case of constraints 3-6, the event
type considered by default is “complete”.

With constraint 3, we want to understand how many submitted applications
are eventually accepted. As shown in Table 11, there are 13 087 submissions, of
which only 5 113 are eventually accepted (around 39%). Using constraint 4, we
can see that the majority of these accepted applications (around 79%) are
accepted in less than 24 hours from the submission. Using constraints 5 and 6,
we can understand how the requested amount affects the application. In
particular, when the requested amount is lower than 10 000, the acceptance rate
is almost 30%. The acceptance rate is higher if the requested amount is greater
than or equal to 10 000 (almost half of the applications are accepted in this
case).

* The execution times in all the tables of this section are averaged over 5 runs.

† All the experiments described in this section have been performed on a machine
with an Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz (limiting the execution to
just one core), 8 GB of RAM and the Oracle Java virtual machine installed on a
GNU/Linux Ubuntu operating system.

Table 10: Reference constraints used to analyze the log from the BPI challenge
2012.

Id  Constraint  1st param.                     2nd param.                     Activation condition    Correlation condition             Time condition
3   Response    A_SUBMITTED                    A_ACCEPTED                     -                       -                                 -
4   Response    A_SUBMITTED                    A_ACCEPTED                     -                       -                                 0,24,h
5   Response    A_SUBMITTED                    A_ACCEPTED                     A.AMOUNT_REQ >= 10 000  -                                 -
6   Response    A_SUBMITTED                    A_ACCEPTED                     A.AMOUNT_REQ < 10 000   -                                 -
7   Response    W_Valideren aanvraag-SCHEDULE  W_Valideren aanvraag-START     -                       -                                 -
8   Response    W_Valideren aanvraag-SCHEDULE  W_Valideren aanvraag-START     -                       A.org:resource != T.org:resource  -
9   Response    W_Valideren aanvraag-SCHEDULE  W_Valideren aanvraag-START     -                       A.org:resource != T.org:resource  0,7,d
10  Response    W_Valideren aanvraag-SCHEDULE  W_Valideren aanvraag-START     -                       A.org:resource != T.org:resource  0,24,h
11  Response    W_Valideren aanvraag-START     W_Valideren aanvraag-COMPLETE  -                       -                                 -
12  Response    W_Valideren aanvraag-START     W_Valideren aanvraag-COMPLETE  -                       A.org:resource == T.org:resource  -
13  Response    W_Valideren aanvraag-START     W_Valideren aanvraag-COMPLETE  -                       A.org:resource == T.org:resource  0,1,h
14  Response    W_Valideren aanvraag-START     W_Valideren aanvraag-COMPLETE  -                       A.org:resource == T.org:resource  0,15,m

Table 11: Conformance checking results using the log from the BPI challenge
2012.

Id  Act.no.  Viol.no.  Fulfill.no.  Avg.act.sparsity  Avg.viol.ratio  Avg.fulfill.ratio
3   13 087   7 974     5 113        0.8596            0.6093          0.3907
4   13 087   9 036     4 051        0.8596            0.6905          0.3095
5   6 847    3 601     3 246        0.9585            0.5259          0.4741
6   6 240    4 373     1 867        0.9211            0.7008          0.2992
7   5 023    51        4 972        0.9909            0.0102          0.9898
8   5 023    236       4 787        0.9909            0.047           0.953
9   5 023    263       4 760        0.9909            0.0524          0.9476
10  5 023    2 897     2 126        0.9909            0.5767          0.4233
11  7 891    2         7 889        0.9863            0.0003          0.9997
12  7 891    6         7 885        0.9863            0.0008          0.9992
13  7 891    228       7 663        0.9863            0.0289          0.9711
14  7 891    3 355     4 536        0.9863            0.4252          0.5748
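The time conditions in Table 10 are written as compact triples such as 0,24,h. The Python sketch below reads such a triple as "minimum, maximum, unit", with d, h, and m standing for days, hours, and minutes. This interpretation is an assumption based on how the constraints are described in the text (e.g., "in less than 24 hours" for 0,24,h), and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed meaning of the unit letters that appear in the constraint tables.
UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_time_condition(text):
    """Turn a triple such as '0,24,h' into a (lo, hi) pair of timedeltas."""
    lo, hi, unit = (part.strip() for part in text.split(","))
    key = UNITS[unit]
    return timedelta(**{key: int(lo)}), timedelta(**{key: int(hi)})


def within(window, activation_time, target_time):
    """True if the target occurs between lo and hi after the activation."""
    lo, hi = window
    return lo <= target_time - activation_time <= hi
```

For a response constraint the target follows the activation, so the difference is taken as target time minus activation time; a precedence constraint would need the opposite sign.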

With constraints 7-14, we analyze the validation of the applications. With 
constraint 7, we can see that almost 99% of the scheduled validations are even¬ 
tually started. In 95% of the cases, the resource that schedules the validation is 
not the same resource that starts this activity (see constraint 8). In addition, 
in around 94% of the cases, a scheduled validation is started within 7 days from 
the scheduling (constraint 9) and in almost half of the cases the validation is 
started only 24 hours after the scheduling. Constraint 11 indicates that almost 
100% of the validations that have been started are also completed, and almost 
in all the cases the resource that starts the validation is the same resource that 


Table 12: Execution times using the log from the BPI challenge 2012. 


Id 1 

Avg.execution time (milliseconds) 

3 1 

' 2 772 

4 1 

3 220 

5 

3 261 

6 

3 205 

7 

3196 

8 

3100 

9 1 

3212 

10 

3146 

11 

2176 

12 

3210 

13 

3 241 

14 1 

3 258 



(a) Example of fulfillment W_Valideren (b) A correlated target W.Valideren 
aanvraag-START at position 35. aanvraag-CDMPLETE at position 36 executed 

by the same resource. 

Figure 10: Example of fulfillment for constraint 13. 
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(a) Example of violation W.Valideren (b) A possible target W.Valideren 
aanvraag-START at position 37. acuivraag-CQMPLETE occurs more than 1 

hours after. 

Figure 11: Example of violation for constraint 13; W.Valideren 

acLnvraag-COMPLETE occurs outside the required time interval (too late). 



(a) Example of fulfillment W_Valideren (b) Corresponding target executed by the 
aanvraag-START at position 39. same resource. 


Figure 12: Example of fulfillment for constraint 13; 
aatnvraag-START at position 39 is followed by 
actnvraag-COMPLETE within the required time interval. 


Table 13: Reference constraints used to analyze the log from the BPI challenge 
2014. 

Id  Constraint    1st param.  2nd param.  Activation condition       Correlation condition             Time condition 
15  Not response  Open        Reopen      -                          -                                 - 
16  Not response  Open        Reopen      -                          A.org:resource != T.org:resource  - 
17  Response      Open        Closed      -                          -                                 - 
18  Response      Open        Closed      -                          -                                 0,12,h 
19  Response      Open        Closed      A.KMnumber == 'KM0000611'  -                                 0,12,h 
20  Response      Open        Closed      A.KMnumber == 'KM0002043'  -                                 0,12,h 


Table 14: Conformance checking results using the log from the BPI challenge 
2014. 

Id  Act. no.  Viol. no.  Fulfill. no.  Avg. act. sparsity  Avg. viol. ratio  Avg. fulfill. ratio 
15  46 607     2 121     44 486        0.8468              0.0455            0.9545 
16  46 607       510     46 097        0.8468              0.0109            0.9891 
17  46 607       449     46 158        0.8468              0.0096            0.9904 
18  46 607    24 392     22 215        0.8468              0.5234            0.4766 
19     446       386         60        0.9993              0.8655            0.1345 
20     773        48        725        0.9969              0.0621            0.9379 


completes this activity (see constraint 12). In 97% of the cases, the validation 
is done in at most 1 hour (constraint 13), and in more than half of the cases it 
is completed in less than 15 minutes (constraint 14). 
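
To make the semantics of such a check concrete, the following is a minimal sketch of how a response constraint with a correlation condition and a time condition could be evaluated on a single trace. It is not the ProM implementation: the function name `check_response`, the dictionary-based event layout and the two-activation example trace are assumptions made only for illustration, and the combination of a same-resource correlation with a one-hour window is our reading of constraint 13.

```python
from datetime import datetime, timedelta


def check_response(trace, activation, target, correlation, max_delay):
    """Return (fulfillments, violations) as lists of activation positions.

    An activation is fulfilled if some later event has the target name,
    satisfies the correlation condition and occurs within max_delay.
    """
    fulfillments, violations = [], []
    for i, act in enumerate(trace):
        if act["name"] != activation:
            continue
        fulfilled = any(
            ev["name"] == target
            and correlation(act, ev)
            and timedelta(0) <= ev["time"] - act["time"] <= max_delay
            for ev in trace[i + 1:]
        )
        (fulfillments if fulfilled else violations).append(i)
    return fulfillments, violations


def same_resource(a, t):
    # Hypothetical correlation condition: A.org:resource == T.org:resource
    return a["org:resource"] == t["org:resource"]


START = "W_Valideren aanvraag-START"
COMPLETE = "W_Valideren aanvraag-COMPLETE"

# Invented trace: the first validation takes 20 minutes, the second 90 minutes.
trace = [
    {"name": START, "org:resource": "r1", "time": datetime(2012, 1, 1, 10, 0)},
    {"name": COMPLETE, "org:resource": "r1", "time": datetime(2012, 1, 1, 10, 20)},
    {"name": START, "org:resource": "r1", "time": datetime(2012, 1, 1, 11, 0)},
    {"name": COMPLETE, "org:resource": "r1", "time": datetime(2012, 1, 1, 12, 30)},
]

fulfilled, violated = check_response(
    trace, START, COMPLETE, same_resource, timedelta(hours=1)
)
print(fulfilled, violated)
```

In this invented trace the activation at position 0 is fulfilled, while the one at position 2 is a violation because its target occurs too late, mirroring the situations depicted in Figures 10 and 11.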

In Figures 10 and 12 we show two fulfillments for constraint 13 (the 
activations with the correlated targets). Figure 11 shows a violation for the same constraint. 
In Table 12 we show the execution times needed for checking the constraints 
in this case study. As in the first case study presented here, 
the execution time is low (between 2 and 3 seconds on average). 


7.3. Rabobank 

The case study we illustrate in this section has been provided for the BPI 
challenge 2014 by Rabobank Netherlands Group ICT. The log we use 
pertains to the management of calls or mails from customers to the Service 
Desk concerning disruptions of ICT-services. The log contains 46 616 cases. 


Table 15: Execution times using the log from the BPI challenge 2014. 

Id    Avg. execution time (milliseconds) 
15    4 294 
16    5 093 
17    5 240 
18    5 055 
19    4 861 
20    5 398 

Figure 13: Example of violation for constraint 16; Open is followed by an event 
Reopen associated to a different resource. (a) Example of violation: Open at 
position 1. (b) A forbidden event Reopen occurs after Open. 


These cases comprise 466 737 events referring to 39 different event classes. 
There are 242 originators and domain-specific event attributes like KM number, 
Interaction ID and IncidentActivity_Number. For this case study, we have used 
the constraints shown in Table 13. 

As shown in Table 14, constraint 15 has 46 607 activations and 44 486 
fulfillments. This shows that around 95% of open calls are 
not reopened afterwards. This percentage is even higher if we require that an 
open call cannot be eventually reopened by a different resource (see constraint 
16). Indeed, this is true in almost 99% of the cases. 

Around 99% of the open calls are eventually closed (see constraint 17). 
Around half of them are closed within 12 hours (constraint 18). The “KM 
number” in this case study identifies the characteristics of a call to understand 
how urgent the corresponding problem is. The checks on constraints 19 and 20 show 
that the calls corresponding to the number KM0002043 are, in general, more 
urgent than the ones corresponding to the number KM0000611. Indeed, out of 
446 calls corresponding to the KM number KM0000611, only 60 are closed within 
12 hours. On the other hand, out of 773 calls corresponding to the KM number 
KM0002043, 725 are closed within 12 hours. 
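
The percentages quoted above can be recomputed from the activation, violation and fulfillment counts reported for constraints 15-20. The sketch below does this over the whole log; the function name `ratios` is an assumption, and the table reports averages over traces, so the agreement with the reported values to four decimals is an observation about these numbers rather than a statement about how the tool computes them.

```python
# (activations, violations, fulfillments) per constraint id, copied from the
# conformance checking results for the BPI challenge 2014 log.
counts = {
    15: (46607, 2121, 44486),
    16: (46607, 510, 46097),
    17: (46607, 449, 46158),
    18: (46607, 24392, 22215),
    19: (446, 386, 60),
    20: (773, 48, 725),
}


def ratios(activations, violations, fulfillments):
    """Return (violation ratio, fulfillment ratio), rounded to 4 decimals."""
    # Every activation is either a violation or a fulfillment.
    assert violations + fulfillments == activations
    return (
        round(violations / activations, 4),
        round(fulfillments / activations, 4),
    )


for constraint_id, row in counts.items():
    viol_ratio, fulfill_ratio = ratios(*row)
    print(constraint_id, viol_ratio, fulfill_ratio)
```

For constraint 19, for instance, 60 fulfillments out of 446 activations give a fulfillment ratio of 0.1345, whereas constraint 20 reaches 0.9379 with 725 out of 773.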

Figure 13 shows a violation for constraint 16. The selected event Open is 
followed by a forbidden event Reopen (associated to a different resource). Table 15 
shows that the execution times for this case study range from about 4.3 to 5.4 seconds. 
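
The negative constraint discussed here can be sketched in the same style. The snippet below is an illustrative reading of a not response constraint with the correlation condition A.org:resource != T.org:resource, not the code of the ProM plug-in; the function name `check_not_response` and both example traces are invented for the purpose of the example.

```python
def check_not_response(trace, activation, target, correlation):
    """Return (fulfillments, violations) as lists of activation positions.

    An activation is violated if some later event has the target name and
    satisfies the correlation condition; otherwise it is fulfilled.
    """
    fulfillments, violations = [], []
    for i, act in enumerate(trace):
        if act["name"] != activation:
            continue
        forbidden = any(
            ev["name"] == target and correlation(act, ev)
            for ev in trace[i + 1:]
        )
        (violations if forbidden else fulfillments).append(i)
    return fulfillments, violations


def different_resource(a, t):
    # Correlation condition A.org:resource != T.org:resource
    return a["org:resource"] != t["org:resource"]


# Invented traces: one call reopened by another resource, one by the same.
reopened_by_other = [
    {"name": "Open", "org:resource": "r1"},
    {"name": "Reopen", "org:resource": "r2"},
]
reopened_by_same = [
    {"name": "Open", "org:resource": "r1"},
    {"name": "Reopen", "org:resource": "r1"},
    {"name": "Closed", "org:resource": "r1"},
]

result_other = check_not_response(
    reopened_by_other, "Open", "Reopen", different_resource
)
result_same = check_not_response(
    reopened_by_same, "Open", "Reopen", different_resource
)
print(result_other, result_same)
```

Only the first trace yields a violation: the Reopen event is executed by a resource other than the one that opened the call, which is the situation shown in Figure 13.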


8. Conclusion and Future Work 

In this work, we propose a framework for checking the conformance of event 
logs with respect to MP-Declare models. MP-Declare is an extension of the 
declarative process modeling language Declare that allows the modeler to 
specify constraints over the data associated to the control-flow and over the “time 
dimension” of a business process. We describe and discuss in detail how the 
proposed framework can be used to define algorithms for conformance checking 
based on MP-Declare. Our proposal has been implemented in the process 
mining tool ProM. The implemented software covers the entire set of MP-Declare 
templates. In addition, the conformance checker can also be used with standard 
Declare. Extensive experiments have been carried out using both real-life and 
synthetic logs. These case studies demonstrate the applicability of our implementation 
in realistic settings. Although it is extremely important to recognize deviations 
a posteriori, in some particular contexts it would also be useful to detect 
violations on-the-fly as they occur. To this aim, in the near future we are planning 
to make the proposed framework suitable to be used in online settings. 